feat: external MCP connectors via AgentCore Identity #174
philmerrell merged 35 commits into develop
…middleware
First phase of the Connectors refactor, which will eventually replace the bespoke OAuth token store (OAuthTokenRepository, KMS-encrypted DynamoDB, Secrets Manager client credentials, manual refresh) with AgentCore Identity's managed token vault and credential providers.
- AgentCoreContextMiddleware copies the four Runtime headers (WorkloadAccessToken, OAuth2CallbackUrl, session ID, request ID) into BedrockAgentCoreContext on every invocation. Required because the Inference API is a plain FastAPI app rather than BedrockAgentCoreApp, so the SDK does not populate the context for us. No-op when headers are absent, so local development and unit tests continue to work without mocks.
- AgentCoreIdentityClient wraps IdentityClient.get_token() with a narrower, platform-friendly surface for USER_FEDERATION (3LO) flows. Surfaces the "user consent required" case as a structured TokenResult(authorization_url=...) rather than an exception, so it can flow through the existing SSE stream as a new event type in a later phase.
Both modules are pure additions; no existing code path calls them yet.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
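A minimal sketch of the header-to-context copy this commit describes, using only the standard library. The header names, the `populate_context` helper, and the `ContextVar` store are hypothetical stand-ins for illustration; the real middleware writes into the SDK's `BedrockAgentCoreContext`.

```python
from contextvars import ContextVar
from typing import Mapping, Optional

# Hypothetical header names (lower-cased); the actual Runtime headers may differ.
HEADER_TO_FIELD = {
    "workloadaccesstoken": "workload_access_token",
    "oauth2callbackurl": "oauth2_callback_url",
    "x-session-id": "session_id",
    "x-request-id": "request_id",
}

# Stand-in for BedrockAgentCoreContext: one ContextVar per field.
_CONTEXT: dict[str, ContextVar[Optional[str]]] = {
    name: ContextVar(name, default=None) for name in HEADER_TO_FIELD.values()
}


def populate_context(headers: Mapping[str, str]) -> list[str]:
    """Copy known headers into the context; return the fields that were set."""
    lowered = {key.lower(): value for key, value in headers.items()}
    populated = []
    for header, field_name in HEADER_TO_FIELD.items():
        value = lowered.get(header)
        if value:  # absent headers are a no-op, so local dev needs no mocks
            _CONTEXT[field_name].set(value)
            populated.append(field_name)
    return populated


def get_field(field_name: str) -> Optional[str]:
    """Read a field back from the context (None when it was never set)."""
    return _CONTEXT[field_name].get()
```

In a FastAPI app, a function like this would typically be called from an HTTP middleware before the request reaches the route handler.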
Wires the Runtime context middleware into the Inference API and swaps the external MCP client's token source from the bespoke OAuthService to AgentCore Identity's USER_FEDERATION flow.
- main.py: installs AgentCoreContextMiddleware so WorkloadAccessToken and OAuth2CallbackUrl Runtime headers populate BedrockAgentCoreContext on every invocation.
- external_mcp_client.py: _get_oauth_token now returns a TokenResult from AgentCoreIdentityClient instead of a decrypted token string from OAuthService. Scopes are read from the platform's OAuth provider record so organizations can change them without code. When the SDK signals that user consent is required, the authorization URL is stashed per-user for the inference route to surface via an oauth_required SSE event (emitter to follow in a subsequent commit). load_external_tools skips client creation on consent-required rather than creating a client that would fail at the first request.
- Convention: the platform's provider_id is used verbatim as the AgentCore Identity credential-provider name. Admins register matching names via CreateOauth2CredentialProvider during provider setup.
The OAuthService, token vault, and encryption layer are still referenced by unrelated code paths (admin routes, connections UI) and will be removed in Phase 3 once the AgentCore-backed flow is validated end-to-end.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Rebrand the user-facing OAuth UI from "connections" to "connectors" for consistent vernacular across the product. Folders, classes, types, and route paths all follow the new name; the /settings/connections URL redirects to /settings/connectors. The backend /oauth/connections endpoint is preserved as a stable contract and translated at the service layer. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Wraps bedrock-agentcore-control for admin-side OAuth2 credential provider CRUD: create/update/delete/get with vendor mapping (Google/Microsoft/GitHub to their native vendors; Canvas/Custom routed through CustomOauth2 via an OIDC discovery URL or explicit authorization-server metadata). Domain errors map 404/conflict/invalid-custom to typed exceptions so route handlers can translate cleanly. Update is intentionally non-partial: AgentCore's UpdateOauth2CredentialProvider requires a full oauth2ProviderConfigInput and Get never returns the stored client_secret, so credential rotation always re-submits both clientId and clientSecret. 17 unit tests cover every vendor path, error mapping, and the Custom-only discovery rule. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds Create/Update/Delete/Get/List on bedrock-agentcore OAuth2 credential providers to the app-api task role, scoped to the default token vault. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Deletes the legacy 3LO dance that predates AgentCore Identity — the per-user token vault, PKCE-based authorization service, encryption layer, token cache, user-facing /oauth/* routes, and the tool-side OAuthToolService. AgentCore Identity owns the token vault and consent flow now; the inference path already routes through agentcore_identity.py via the recent external MCP client refactor, so these modules had no live consumers. Also slims shared/oauth/__init__.py to the surviving surface (provider model, repository, registrar) and unwires the user-facing router from app_api/main.py. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AgentCore Identity owns the clientId, clientSecret, endpoint config, and callback URL. Our DynamoDB record keeps only the admin metadata (display name, scopes, role gates, icon) plus cached pointers to AgentCore's record (credential_provider_arn, callback_url) for convenience. Drops authorization_endpoint, token_endpoint, authorization_params, userinfo_endpoint, revocation_endpoint, pkce_required, OAuthUserToken, and the user-side connection DTOs — all artifacts of the retired in-house flow. Adds oauth_discovery_url and authorization_server_metadata for Custom/Canvas providers, gated by a pydantic validator. Repository surface tightens to put_provider + apply_metadata_update; the Secrets Manager write/read path is gone. Admin routes (commit next) own the AgentCore round-trip and hand a fully-formed record to the repo. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
POST now calls the registrar first and, on success, upserts the metadata record in DynamoDB. If the DB write fails after AgentCore has accepted the credentials, we best-effort delete the AgentCore provider to avoid orphans.
PATCH distinguishes metadata-only edits (scopes, roles, display name, icon, enabled) from credential rotation. Rotation requires clientId + clientSecret together; partial updates are rejected by AgentCore's UpdateOauth2CredentialProvider contract.
DELETE removes the AgentCore provider first (which revokes every user token stored in its vault), then the local record. Pre-existing connection-count checks are dropped since per-user tokens no longer live in our DB.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Admin side:
- Rename admin/oauth-providers → admin/connectors (file + route); old route path redirects for URL stability
- Rewrite the admin model to the AgentCore-owned shape: drop endpoint fields, authorization_params, pkce_required, userinfo/revocation endpoints. Add credential_provider_arn, callback_url, and oauth_discovery_url / authorization_server_metadata for Custom vendors
- Rewrite the admin form: preset picker simplified to display metadata only, Custom requires an OIDC discovery URL, credential rotation requires clientId + clientSecret together (AgentCore's update API is not partial), success screen after create displays the AgentCore callback URL with a copy button so the admin can paste it into the vendor console, edit mode shows the callback URL + ARN read-only
User-facing retirement:
- Delete settings/connectors (user "my connected accounts" page), settings/oauth-callback (legacy 3LO return handler), and the sidebar + route entries for them. AgentCore Identity owns the consent flow at runtime via the existing /oauth-complete landing page
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When an external MCP tool needs OAuth consent, AgentCore Identity returns an authorization URL instead of a token. This wires that signal all the way to the user:
Backend:
- Inference route drains pending consent URLs from the external MCP integration after the agent stream finishes and emits one oauth_required SSE event per provider before done
- IAM grants bedrock-agentcore:GetResourceOauth2Token on the runtime role so the AgentCore Identity client can reach the token vault
- CLAUDE.MD + SSE_ERROR_MESSAGING.md document the new event
Frontend:
- Stream parser recognizes oauth_required and surfaces it as an OAuthRequiredEvent
- New /oauth-complete landing page handles the AgentCore callback redirect and postMessages consent completion to the opener tab
- OAuthConsentService orchestrates popup opening + postMessage receipt
- OAuthConsentBanner renders the Connect button inside the chat input
- chat-http and assistant preview pass OAuth2CallbackUrl header so AgentCore Runtime knows where to return after consent
Also updates the admin Tool form reference from /admin/oauth-providers to /admin/connectors to match the renamed admin surface.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…izer
Adds the Settings → Connectors page so users can browse and connect OAuth-backed external tools end-to-end:
- New /connectors routers on app-api (list user-visible providers via RBAC) and inference-api (initiate-consent, complete-consent); the inference-api side runs under the AgentCore Runtime proxy where the WorkloadAccessToken context is populated.
- AgentCoreIdentityClient gains a workload-token mint fallback for local dev (GetWorkloadAccessTokenForUserId) and appends provider_id to the callback URL so the landing page can dismiss the right banner.
- /oauth-complete page POSTs CompleteResourceTokenAuth back through the inference-api before notifying the opener, fixing the "consent finished but vault stayed empty" race. Uses BroadcastChannel to bridge popup → opener under Chrome's COOP isolation.
- New connectors settings page with a Connect / Reconnect affordance per provider, wired to the OAuthConsentService popup flow.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… interrupts
The agent used to pre-flight OAuth at tool-load time and abort the whole
turn if any provider needed consent — the user then had to retype the
prompt after authorizing. This switches to the Strands interrupt
protocol: the consent gate runs lazily before each tool call, pauses
the in-flight turn, and resumes it automatically once the user
finishes the popup.
Backend
- New OAuthConsentHook (BeforeToolCallEvent + AfterToolCallEvent).
- BeforeToolCall: looks up the OAuth provider for the selected
MCPAgentTool's MCPClient (no name coupling), checks the in-process
token cache, and either lets the tool run or calls
event.interrupt(...) with the consent URL when AgentCore Identity
reports consent required.
- AfterToolCall: detects 401-style failures from MCP tool results,
marks the (user, provider) for force_authentication on the next
fetch, and sets event.retry = True so the BeforeToolCall hook
re-fires and triggers a fresh consent. Closes the gap where a
provider-side revocation leaves a stale token in AgentCore's vault.
- New oauth_token_cache: per-(user, provider) tokens + force-reauth
flags; lifecycle-managed by the hook.
- ExternalMCPIntegration always loads MCP clients with a lazy
token_provider that reads from the cache; the pending_consent /
drain_pending_consent dict and the route's pre-LLM short-circuit
branch are gone.
- StreamCoordinator emits one oauth_required SSE event per pending
interrupt before the final done event, carrying interruptId so the
frontend can resume the same turn.
- ChatAgent.stream_async accepts interrupt_responses and forwards them
to Strands as the resume prompt; route accepts the same on
/invocations and skips quota + RAG augmentation on resume.
Frontend
- OAuthRequiredEvent type + validator gain interruptId; settings-page
consent path makes interruptId optional (no agent turn to resume).
- OAuthConsentService tracks the interruptId per request and invokes a
registered resume handler on broadcast success.
- ChatRequestService snapshots the last turn's payload and replays it
with interrupt_responses attached when a consent completes — the
user never retypes the prompt.
Smoke-tested end-to-end: Google revoke → whoami → 401 → AfterToolCall
detects + retries → fresh consent banner → popup → auto-resume → tool
returns greeting in the same turn.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
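The before/after gating described in this commit can be sketched roughly as follows. This is an illustration only: `ToolCall`, `ConsentGate`, and `fetch_token` are hypothetical stand-ins, not the Strands hook API or the actual `OAuthConsentHook`, and the 401 check is deliberately naive.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class TokenResult:
    token: Optional[str] = None
    authorization_url: Optional[str] = None


@dataclass
class ToolCall:
    """Simplified stand-in for a tool-call event."""
    provider_id: str
    interrupt_url: Optional[str] = None  # set when the gate pauses the turn
    retry: bool = False


@dataclass
class ConsentGate:
    user_id: str
    # Hypothetical signature: (user_id, provider_id, force_authentication)
    fetch_token: Callable[[str, str, bool], TokenResult]
    tokens: dict = field(default_factory=dict)
    force_reauth: set = field(default_factory=set)

    def before_tool_call(self, call: ToolCall) -> None:
        key = (self.user_id, call.provider_id)
        force = key in self.force_reauth
        if key in self.tokens and not force:
            return  # cache hit: let the tool run
        result = self.fetch_token(self.user_id, call.provider_id, force)
        if result.token:
            self.tokens[key] = result.token
            self.force_reauth.discard(key)
        else:
            call.interrupt_url = result.authorization_url  # pause for consent

    def after_tool_call(self, call: ToolCall, tool_error: Optional[str]) -> None:
        if tool_error and "401" in tool_error:  # naive check, for illustration
            key = (self.user_id, call.provider_id)
            self.tokens.pop(key, None)   # drop the stale token
            self.force_reauth.add(key)   # next fetch forces re-authentication
            call.retry = True            # causes before_tool_call to run again
```

The point of the two-phase shape is that a provider-side revocation is only discoverable after the tool runs, so the after-call step has to mark the pair for re-authentication and hand control back to the before-call step.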
    logger.error(
        "CompleteResourceTokenAuth failed for user=%s provider=%s: %s",
        current_user.user_id,
        body.provider_id,
Code Review
Large structural rewrite (+5,125 / −5,781 across 87 files) that replaces the in-house OAuth flow with AWS Bedrock AgentCore Identity and moves OAuth consent from a pre-flight gate to mid-turn Strands interrupts. Architecture is cleaner (AgentCore owns secrets/vault; our DB owns metadata), the resume-without-retyping UX is a nice win, and tests look complete. A handful of real issues should be addressed before merging.
Critical Issues
Suggestions
What Looks Good
Verdict
Request Changes: merge-blocking on items #1 (verify CI:
🤖 Generated with Claude Code
…uth-failure regex
Hardens two gaps called out in review of the AgentCore OAuth flow.
- `/connectors/complete-consent` now verifies the submitted `session_uri` was issued to the authenticated user at `initiate_consent`, rejecting cross-user replay with 403 before ever calling AgentCore. Backed by a thread-safe TTL cache (10 min, single-use). Soft-fails with a warning when AgentCore's authorize URL doesn't carry a recognised session parameter, so an SDK shape change logs rather than blocks.
- `_AUTH_FAILURE_PATTERN` tightened with word boundaries on every clause and a non-path guard on `401` so tool errors containing `/v1/401/...` no longer trigger a spurious force-reauth.
Also moves `import boto3`/`os` out of the `complete_consent` handler body and caches the control-plane client via `lru_cache`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…back
Addresses the remaining two critical items from PR #174 review.
Registrar response parsing (`_info_from_response`): fails loudly on contract violations rather than silently storing empty strings. Missing `clientSecretArn` still tolerated (some vendors won't persist one) but a wrong-shape `clientSecretArn` or absent `credentialProviderArn` now raises TypeError so an AgentCore API change surfaces as a real error.
Admin create-provider rollback (`_rollback_orphaned_provider`): now retries the AgentCore delete twice with backoff before giving up. On exhaustion, emits a CloudWatch `Agentcore/OAuth::ProviderOrphaned` custom metric so ops can alarm on stranded credential providers. Secondary failures (CW down, registrar down after retries) never shadow the admin's original 5xx; they only log. The subsequent create attempt that hits `CredentialProviderConflictError` with no DB record now returns an actionable 409 pointing at the AWS CLI cleanup command instead of a bare "already exists".
App API task role grants `cloudwatch:PutMetricData` scoped to the `Agentcore/OAuth` namespace via a condition key.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
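A rough sketch of the retry-then-metric rollback shape this commit describes. The function name, parameters, and delay values are hypothetical, and reading "retries twice" as one initial attempt plus two retries is an assumption; the real `_rollback_orphaned_provider` may count differently.

```python
import time
from typing import Callable


def rollback_orphaned_provider(
    delete_provider: Callable[[], None],
    emit_orphan_metric: Callable[[], None],
    retries: int = 2,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Attempt the delete once plus `retries` retries; True if any succeeded."""
    for attempt in range(retries + 1):
        try:
            delete_provider()
            return True
        except Exception:
            if attempt < retries:
                sleep(base_delay * (2 ** attempt))  # exponential backoff
    # All attempts failed: record the orphan so ops can alarm on it.
    try:
        emit_orphan_metric()
    except Exception:
        pass  # a metrics failure must not shadow the caller's original error
    return False
```

Injecting `sleep` keeps the backoff testable without real delays. Real code would log in both `except` branches rather than swallow silently.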
- Reject non-https authorizationUrls at both intake and open time so a compromised backend can't smuggle javascript:/data: URIs into a user click.
- Replace window.location.href hijack on popup-block with a blocked signal; the banner renders an "Open in new tab" anchor instead of tearing down the chat tab.
- Reject resume requests whose interruptIds aren't present in the cached agent's _interrupt_state with 400, preventing silent acceptance after cache eviction, process restart, or forged payloads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CodeQL flagged the provider_id interpolation as clear-text logging of sensitive data — its taint analysis traces provider_id back through the OAuth credential path. The provider ID itself isn't secret, but the log line doesn't need it: tool_id already identifies the tool, and "(OAuth)" alone confirms auth was wired up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… tests
Both tests codify behavior that commit b55653d intentionally retired:
- TestInvocationsOAuthRequired exercised drain_pending_consent and the route-level oauth_required emission path. That path is gone: consent URLs now flow through Strands' _interrupt_state inside agent.stream_async (stream_coordinator.py:543), and the hook behavior is covered by tests/agents/main_agent/session/test_oauth_consent_hook.py.
- test_missing_message_returns_422 expected message to be required, but InvocationRequest.message is now default "" so resume requests can reuse the original prompt from interrupt context.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…1 detection
Fixes the constant Google re-auth bug: the consent hook was calling AgentCore Identity with `callback_url=None` whenever the inference API ran outside the Runtime proxy (every local-dev session). AgentCore then issued an authorize URL whose redirect went somewhere other than `/oauth-complete`, so consent never finalized and every request looped back through the consent flow. Adds a `CallbackUrlUnavailableError` and an `AGENTCORE_LOCAL_OAUTH_CALLBACK_URL` env-var fallback in `_resolve_callback_url`, so the failure mode is now loud instead of silent. Both the chat-triggered consent hook and the settings-page `initiate-consent` route catch it and return 503 with actionable guidance.
Also tightens the OAuth 401 detection regex to reduce false-positive re-auth prompts: `\bunauthorized\b` now requires proximity to an HTTP/status/code keyword (previously matched prose like "unauthorized to view this calendar"), and adds high-confidence signals for OAuth `invalid_grant` (refresh-token revocation) and Google's `UNAUTHENTICATED` status / `invalid authentication credentials` message.
Drops the in-process `session_cache` defence-in-depth on `complete-consent`: AgentCore's own `userIdentifier` ↔ `sessionUri` binding already rejects mismatched completions, and the local cache cost real operational pain (multi-worker / restart / `--reload` would break legitimate consent flows with a confusing 403). Trust the JWT-derived `current_user` plus AgentCore's binding instead.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
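To illustrate the kind of pattern these regex changes describe, here is a hedged sketch. It is not the project's `_AUTH_FAILURE_PATTERN`; the clauses, the 20-character proximity window, and the name `AUTH_FAILURE` are assumptions chosen to reproduce the behaviors listed in the commit messages.

```python
import re

AUTH_FAILURE = re.compile(
    r"""
      (?<![/\w])401(?![/\w])                                  # bare 401, not a path segment
    | \b(?:http|status|code)\b[^.\n]{0,20}\bunauthorized\b    # keyword shortly before the word
    | \binvalid_grant\b                                       # refresh-token revocation
    | \bUNAUTHENTICATED\b                                     # Google status string
    | \binvalid\ authentication\ credentials\b                # Google error message
    """,
    re.IGNORECASE | re.VERBOSE,
)


def looks_like_auth_failure(message: str) -> bool:
    """Return True when a tool error message resembles an OAuth/401 failure."""
    return AUTH_FAILURE.search(message) is not None
```

With this sketch, `"HTTP 401 Unauthorized"` and `'{"error": "invalid_grant"}'` match, while `"GET /v1/401/items failed"` and `"You are unauthorized to view this calendar"` do not.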
Several user-facing connector improvements that share a foundation
(per-user `force_reauth` lifecycle in the in-process token cache):
- New `GET /connectors/{id}/status`: side-effect-free read that the
settings page uses to render a "Connected" badge without committing
the user to a consent flow (initiate-consent always triggers a
server-side pending session). Honors the `force_reauth` flag — a
just-disconnected user is reported as not connected even if the vault
still holds an unexpired token.
- New `DELETE /connectors/{id}/connection`: best-effort disconnect that
flips the local `force_reauth` flag (AgentCore exposes no per-user
vault-delete API). The next status check returns `connected: false`,
the next initiate-consent passes `force_authentication=True`, and the
user re-authorizes from scratch. complete-consent clears the flag on
success so the UI flips back to connected without waiting on the agent
loop to warm the cache.
- Frontend Disconnect button on connected rows. Confirmation dialog uses
the existing `ConfirmationDialogComponent` (CDK Dialog, destructive
styling) — also swapped the admin connector-list delete from native
`confirm()` to the same component for visual consistency.
- Closed-popup recovery in `OAuthConsentService`: poll `popup.closed`
after open and drop the provider from `inFlight` if the user dismisses
without completing consent. The pending request stays so the chat
banner re-offers Connect; the settings page resets `awaiting` →
`idle` via the new `inFlightProviders` signal.
- Settings page: loading skeleton in the row's action area while the
status probe resolves, dropped the misleading "Reconnect" button
(clicking it just hit `initiate-consent` and toasted "already
connected"), and removed the scope-list display under each connector.
- Forward Google's `access_type=offline` (per AgentCore Identity docs)
via a new vendor-baseline helper, plumbed through both the
chat-triggered consent hook and the settings/initiate-consent /
status routes via two new optional lookups on `OAuthConsentHook`
(`provider_type_lookup`, `custom_parameters_lookup`). Without this
Google issues a 1-hour access token with no refresh path and the
vault entry becomes unrefreshable.
- Admin-configurable `custom_parameters` field on the OAuth provider
record (DynamoDB `customParameters` map, Pydantic Create/Update/
Response, admin form `key=value` textarea with parse/serialize
helpers). Merged with the vendor baseline at request time — baseline
wins on conflict so admins cannot accidentally turn off documented
vendor requirements.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
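The `key=value` parsing and the "baseline wins on conflict" merge described in this commit could look roughly like the sketch below. The function names and the `VENDOR_BASELINE` table are hypothetical; only the Google `access_type=offline` entry comes from the commit message.

```python
def parse_custom_parameters(text: str) -> dict[str, str]:
    """Parse a key=value-per-line textarea into a dict, skipping blank lines."""
    params: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {line!r}")
        params[key.strip()] = value.strip()
    return params


# Hypothetical baseline table keyed by provider type.
VENDOR_BASELINE: dict[str, dict[str, str]] = {
    "google": {"access_type": "offline"},
}


def merge_custom_parameters(provider_type: str, admin_params: dict[str, str]) -> dict[str, str]:
    """Merge admin-supplied parameters with the vendor baseline."""
    # The baseline is unpacked last, so it overrides conflicting admin values.
    return {**admin_params, **VENDOR_BASELINE.get(provider_type, {})}
```

Unpacking the baseline last is what prevents an admin-entered `access_type=online` from disabling the refresh-token path.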
…aceholders
Per the AgentCore Identity supported-providers docs, Slack, Salesforce,
and Zoom are first-class vendors with pre-configured endpoints — admins
only need to supply credentials. Verified the exact `credentialProviderVendor`
strings and `oauth2ProviderConfigInput` keys against the SDK shape
(`Oauth2ProviderConfigInput.members`):
- Slack → SlackOauth2 / slackOauth2ProviderConfig
- Salesforce → SalesforceOauth2 / salesforceOauth2ProviderConfig
- Zoom → ZoomOauth2 / includedOauth2ProviderConfig
(shared key for simpler vendors)
Backend additions: `SLACK`, `SALESFORCE`, `ZOOM` on `OAuthProviderType`;
vendor + config-key entries on the registrar. The existing discovery-URL
guard correctly rejects discovery URLs for these new types.
Frontend additions: matching `ConnectorType` literals; preset entries
with sensible default scopes and vendor-relevant placeholder hints (e.g.
Salesforce `api, refresh_token, offline_access, id, openid`); icon
class branches for the new tiles (Slack fuchsia + chat bubble,
Salesforce sky + cloud, Zoom blue + video camera).
Form polish:
- `scopesPlaceholder` / `customParametersPlaceholder` on each preset.
Form binds them via computed signals so the hints update as the admin
switches between providers.
- Selecting a preset seeds `customParameters` only when the preset
declares `defaultCustomParameters` — avoids clobbering user-typed
content for presets that have only a hint.
- Dropped the Google `defaultScopes`. The OIDC-only
`openid email profile` set doesn't actually let an agent do anything
useful with Google APIs (Calendar/Gmail/Drive each need different
scopes), so the form lands empty and the placeholder shows the URL
format as a hint.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    logger.info(
        "Completed OAuth consent for user=%s provider=%s",
        current_user.user_id,
        body.provider_id,
…store
Replaces the floating OAuth banner with an inline prompt anchored to the
assistant turn that triggered consent, and persists pending interrupts to
session metadata so a browser refresh rediscovers them instead of leaving
the tool call orphaned in `pending` forever.
Backend
- New `PendingInterrupt` model on `apis.shared.sessions.models`; included
on `MessagesListResponse` and `SessionMetadata`.
- `metadata.add_pending_interrupt` / `remove_pending_interrupts` /
`get_pending_interrupts` helpers using GSI lookup + targeted UpdateExpression.
- `StreamCoordinator._extract_oauth_required_events` is now async and
persists each interrupt before yielding the SSE event; failures log but
never break the live stream.
- `get_messages_from_cloud` fetches pending interrupts in parallel.
- `/invocations` resume path clears resolved interrupts from metadata
after `agent.stream_async` completes.
- New `DELETE /sessions/{sid}/pending-interrupts/{iid:path}` endpoint
for explicit dismiss; colon-bearing Strands ids preserved via `:path`.
Frontend
- New `OAuthConsentPromptComponent` with a refined inline card design,
connector icon (admin base64 wins over heroicon, falls back to
providerType default), eyebrow/lock motif, primary gradient action
button, hover-revealed dismiss, fade+slide entrance.
- `MessageMapService.loadMessagesForSession` hydrates pending interrupts
on session load; anchors to triggering message id when present, else
the most recent assistant message.
- `OAuthConsentService.openConsentPopup` is async; lazy-fetches a fresh
authorization URL via `initiate-consent` when the stored one is absent
or expired (handles "already consented in another tab" by auto-resuming).
- `OAuthConsentService.dismiss` syncs to backend by default; completion
flow opts out so the resume path's own cleanup isn't double-fired.
- `MessageListComponent` renders unanchored interrupts at end-of-list as
a fallback for the "partial assistant message wasn't persisted" case.
- `awaiting_auth` derived tool status renders as a primary-blue ring on
the tool-rail dot instead of an indefinite amber spinner.
- `ChatRequestService.resumeFromOAuthConsent` accepts a fallback session
id (post-refresh case where `lastRequestObject` is null) and surfaces
400 `Unknown or expired interrupt ids` as a conversational error.
- Old floating `OAuthConsentBannerComponent` removed.
Known follow-up
- First-turn-of-a-new-session OAuth: persistence currently no-ops because
the session metadata row doesn't exist yet when the interrupt fires.
Tracked separately; sidecar item or upsert pattern is the likely fix.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Add functions to ensure session metadata existence and update session title and activity.
- Implement logic for handling session activity updates, including message count increments and preferences merging.
- Introduce deduplication for pending interrupts to prevent duplicate entries during session updates.
- Update frontend components to reflect changes in session management, including OAuth consent prompts and message handling.
- Refactor session service interfaces to use camelCase for consistency with backend responses.
- Enhance tests for session activity updates, pending interrupts, and ensure proper handling of session metadata.
Resume after an OAuth-gated tool call only worked when the in-memory
agent cache still held the original turn. After a browser refresh the
frontend lost its request snapshot and the resume request landed with
no enabled_tools / model_id, so the inference API rebuilt a fresh agent
with an empty external-tool registry — the paused tool call had nothing
to resume against and the LLM responded that the tool wasn't available.
Resume contract now lives server-side. On pause, the stream coordinator
captures a ``PausedTurnSnapshot`` (enabled_tools, model_id, provider,
temperature, system_prompt, caching_enabled, max_tokens) onto the
session row alongside the existing ``pendingInterrupts``. On resume,
the inference API loads the snapshot and rebuilds the agent from it;
Strands' SessionManager then restores ``_interrupt_state`` from
AgentCore Memory, so the paused tool call picks up where it left off
regardless of cache hit/miss, refresh, or pod restart.
Frontend ``lastRequestObject`` snapshotting is gone — the resume
payload is now ``{ session_id, message: '', interrupt_responses }``.
Server-side snapshot has a 1h TTL; cleared on full turn completion
and at the start of any new (non-resume) turn.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
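A small sketch of the capture-and-validate cycle for a paused-turn snapshot with a one-hour TTL. The field subset, helper names, and the use of `LookupError` are assumptions for illustration; the real `PausedTurnSnapshot` carries more fields and is persisted on the session row.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

SNAPSHOT_TTL = timedelta(hours=1)


@dataclass
class PausedTurnSnapshot:
    enabled_tools: list[str]
    model_id: str
    expires_at: datetime


def capture_snapshot(
    enabled_tools: list[str], model_id: str, now: Optional[datetime] = None
) -> PausedTurnSnapshot:
    """Record what the agent needs to be rebuilt with, plus an expiry time."""
    now = now or datetime.now(timezone.utc)
    return PausedTurnSnapshot(list(enabled_tools), model_id, now + SNAPSHOT_TTL)


def load_for_resume(
    snapshot: Optional[PausedTurnSnapshot], now: Optional[datetime] = None
) -> PausedTurnSnapshot:
    """Return the snapshot if it is usable; raise LookupError otherwise."""
    now = now or datetime.now(timezone.utc)
    if snapshot is None:
        raise LookupError("no paused turn to resume")
    if now > snapshot.expires_at:
        raise LookupError("paused turn expired; restart the turn")
    return snapshot
```

A route handler would typically translate either `LookupError` into a 4xx response rather than rebuilding an agent with an empty tool registry.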
…n't fail a turn
Previously, ``load_external_tools`` cached newly-created MCP clients without verifying the server was actually reachable. A single connector that wasn't running locally (or whose endpoint was misconfigured) would sit in the registry and fail the whole turn the first time Strands called ``load_tools()`` on it.
Pre-flight each new client immediately after construction. On failure, log a warning, skip the tool, and continue, so the user keeps their other tools. On success the call also primes the client's tool cache, so Strands' later ``load_tools()`` becomes a no-op.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
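The skip-on-failure pre-flight can be sketched as below. `preflight_clients` and the callable-per-tool shape are hypothetical simplifications; the real code works with MCP client objects rather than bare callables.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def preflight_clients(clients: dict[str, Callable[[], list]]) -> dict[str, list]:
    """Return tool_id -> tool list for every client whose pre-flight succeeds."""
    ready: dict[str, list] = {}
    for tool_id, list_tools in clients.items():
        try:
            ready[tool_id] = list_tools()  # success also primes the tool list
        except Exception as exc:
            # One unreachable server should not take the other tools down.
            logger.warning("Skipping external tool %s: %s", tool_id, exc)
    return ready
```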
    """
    user_id = current_user.user_id

    logger.info("DELETE /sessions/%s/pending-interrupts/%s", session_id, interrupt_id)
    if not snapshot:
        logger.warning(
            "Resume rejected: no paused_turn snapshot for session %s",
            input_data.session_id,

    if expires_at and datetime.now(timezone.utc) > expires_at:
        logger.warning(
            "Resume rejected: paused_turn snapshot expired for session %s",
            input_data.session_id,

            detail="Paused turn expired; restart the turn.",
        )

    caching_enabled = snapshot.caching_enabled
ensure_session_metadata_exists() now runs unconditionally on /invocations and raises when DYNAMODB_SESSIONS_METADATA_TABLE_NAME is unset, breaking route tests that mock the agent and skip DynamoDB. Stub it via an autouse fixture so route tests exercise the route, not the persistence layer. Also patch the new get_pending_interrupts call in the cloud-message tests. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The disconnect flag lived in a module-level set inside the inference API process, so a /disconnect on one replica was invisible to any other. Under multi-replica deploys the user could see "Connected" on one request and "needs consent" on the next, and the AfterToolCallEvent 401-retry path likewise lost its intent on replica fan-out. Move the per-(user, provider) disconnect flag to a new OAuthDisconnectRepository on the existing oauth-user-tokens DynamoDB table (already provisioned, KMS-encrypted, with R/W IAM granted to the inference API). The token cache stays as a hot-path L1 for tokens only; the consent hook reads the disconnect repo on every BeforeToolCallEvent so a disconnect anywhere is honored on the next tool run anywhere. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…wlist
The frontend posts an `OAuth2CallbackUrl` header on every consent-related request, and the inference-api middleware was forwarding it verbatim into `BedrockAgentCoreContext`. An authenticated user could pivot the OAuth redirect to an attacker-controlled origin and capture the authorization code on consent. Reuse `CORS_ORIGINS` as the trust boundary, pin the path to `/oauth-complete`, and reject non-http(s) schemes, query strings, and fragments.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
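A minimal sketch of the validation rules listed in this commit, using `urllib.parse` from the standard library. The function name and the way allowed origins are passed in are assumptions; the real middleware derives the allowlist from `CORS_ORIGINS`.

```python
from urllib.parse import urlsplit

CALLBACK_PATH = "/oauth-complete"


def is_allowed_callback_url(url: str, allowed_origins: set[str]) -> bool:
    """Accept only scheme://host[:port]/oauth-complete on an allowlisted origin."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False  # rejects javascript:, data:, and similar schemes
    if parts.query or parts.fragment:
        return False  # no query strings or fragments
    if parts.path != CALLBACK_PATH:
        return False  # path is pinned
    # netloc includes any userinfo, so "trusted@evil" never equals an allowed origin.
    origin = f"{parts.scheme}://{parts.netloc}"
    return origin in allowed_origins
```

Comparing the full `scheme://netloc` string keeps the check strict: a port mismatch or embedded credentials both fail the membership test.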
A misconfigured provider (wrong scope, perma-401) would otherwise spawn
a fresh consent prompt on every tool call in a turn: the per-tool-use
retry guard reset for each new toolUseId, so the model could trigger
prompt-after-prompt with no upper bound. Track attempted providers on
the hook itself, reset on `BeforeInvocationEvent` (fires per turn,
including resume), so the user sees at most one consent prompt per
provider per turn before 401s flow through to the model.
Also clarify the `event.interrupt(name="oauth:{provider_id}")` comment:
the SDK's BeforeToolCallEvent._interrupt_id folds in `toolUseId`, so
parallel tool calls to the same provider already produce distinct
interrupt ids. New regression test pins that invariant.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
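The one-prompt-per-provider-per-turn cap amounts to a set that is cleared at the start of each invocation. A sketch, with a hypothetical class name and method names (the real state lives on the consent hook and is reset from `BeforeInvocationEvent`):

```python
class ConsentPromptLimiter:
    """Allow at most one consent prompt per provider within a single turn."""

    def __init__(self) -> None:
        self._attempted: set[str] = set()

    def on_before_invocation(self) -> None:
        # Runs once per turn (including resume), so the cap is per turn.
        self._attempted.clear()

    def should_prompt(self, provider_id: str) -> bool:
        if provider_id in self._attempted:
            return False  # already prompted this turn; let the 401 reach the model
        self._attempted.add(provider_id)
        return True
```

Keying on the provider rather than the tool-use id is what bounds the prompts: a new `toolUseId` no longer resets the guard.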
A stream replay after refresh, or a late server-side breadcrumb clear, could fire the same `oauth_required` event again after a successful consent or explicit dismissal — and the prompt would resurrect because provider-keyed dedup re-added the entry. Track seen interrupt ids on the consent service so already-resolved interrupts stay gone for the session. New tool calls always carry a fresh interrupt id (Strands generates it from `toolUseId`), so legitimate prompts are never suppressed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
    return
    result = self._mark_disconnected(provider_id)
    if inspect.isawaitable(result):
        await result
… stack
The referenced tables live in InfrastructureStack (moved there to break a prior circular dep); update 9 SSM-read comments to match.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Replaces the in-house OAuth flow for external MCP tools with AgentCore Identity end-to-end, and switches the per-turn consent gate from a pre-flight short-circuit to mid-turn Strands interrupts so users don't have to retype their prompt after authorizing.
- `CreateOauth2CredentialProvider` + friends); our DynamoDB record keeps display/RBAC metadata only.
- `/oauth-complete` calls `CompleteResourceTokenAuth` before notifying the opener so the vault doesn't stay empty.
- `OAuthConsentHook` (`BeforeToolCallEvent` + `AfterToolCallEvent`) gates tool execution. First call → AgentCore vault hit or interrupt with consent URL. Stale token after a provider-side revoke → `AfterToolCallEvent` detects the 401, marks the user/provider for `force_authentication`, sets `retry=True`, and the next `BeforeToolCallEvent` raises a fresh interrupt.
- `oauth_required` SSE events now carry `interruptId`; the frontend snapshots the last turn's payload and replays it with `interrupt_responses` after the popup closes, so the agent picks up the same turn without a retype.
Test plan
- `/admin/oauth-providers`; confirm AgentCore credential provider mirrors changes.
- `AfterToolCallEvent` → retry → fresh consent → resume).
- `uv run python -m pytest tests/agents/main_agent/integrations tests/agents/main_agent/session/test_oauth_consent_hook.py tests/agents/main_agent/integrations/test_oauth_token_cache.py` (should be green).
- `npm run test:ci` (should be green).
🤖 Generated with Claude Code